Note: Code sections can be made visible by using the provided buttons throughout this project

1.0 ABSTRACT

Topic

Problem formulation

Research question

Concepts

Dataset and main data analytics methods and tools

Most important results

Conclusions and recommendations

2.0 INTRODUCTION

Since 2008, Airbnb has grown from a small accommodation platform based in San Francisco into one that is recognised throughout the world. Airbnb has revolutionized the tourism housing industry by applying a sharing economy model to the accommodation business. Today, Airbnb has become the world’s largest accommodation service provider, with more accommodation options than any other accommodation business - and even more than all of them combined. As a platform, Airbnb enables people (hosts) to offer accommodation services to other people (guests), providing guests with a more unique and personalized way of experiencing the world, often at a lower price than other accommodation options. Airbnb captures only a fraction (20%) of these transactions, which in 2019 returned 4.7 billion USD in sales revenue.

2.1 Problem Formulation and Research Question

Data plays a key role in Airbnb’s success. For instance, data enables Airbnb to match guests and hosts, and it allows users to filter the host listings to their liking with respect to pricing, location, number of beds, and much more. Data is thereby essential to securing high customer satisfaction. Moreover, Airbnb can use the collected data to extract insights that improve its service offerings and guide decision making, marketing initiatives, and more.

As a platform, Airbnb’s sole value creation lies in creating successful matches between guests and hosts and in ensuring a positive experience for both parties. Naturally, if the platform fails to deliver a positive experience to a user, the user might abandon the platform altogether, resulting in negative feedback loops. This leads us to our research question:

How can Airbnb ensure matches and the experiences they create are positive for their customers (users), and providers (hosts)? Moreover, how can Airbnb help guide user decisions to create successful matches and positive experiences?

Currently, Airbnb helps the users to create meaningful matches, by allowing the guests to limit their search for accommodation by different attributes related to the individual host listing. As such, users can easily find accommodation that meets their basic needs for accommodation; e.g. number of beds, bedrooms, price, room type, etc. However, without any knowledge of the different location areas, guests might find difficulty in choosing a location that suits their needs.

In this project, we examine the accommodation services listed on Airbnb for Copenhagen, with particular attention to the location areas and the attributes associated with them. The goal is to create a report that can guide customers to choose a location that lives up to their expectations, thereby improving the quality of the matches provided by the platform.

3.0 METHODOLOGY

3.1 Dataset Analysis Process

To answer the question of interest, we perform an exploratory data analysis (EDA) to gain an understanding of which features separate and define the neighbourhoods, and how their accommodation offerings differ. More specifically, we zoom in on the neighbourhoods with respect to which room and property types are most common in each area, and how the neighbourhood affects the listing price. Moreover, the price for accommodating one person is calculated to provide an indication of wealth.

Moving on, we create an interactive map that displays each individual accommodation offering in a geospatial visualization. Through interaction, the map allows the user to easily find listings, view the most expensive listings, and see where each neighbourhood is located. Furthermore, the interactivity enables zooming, panning and filtering of the data to enhance the understanding of the geography.

Finally, we create wordcloud visualizations to display how hosts describe the neighbourhoods, which gives us a sense of which words best describe the location areas.

3.2 Dataset Description

The data was downloaded from the independent site Inside Airbnb, which scrapes data from Airbnb and makes it publicly available for analysis. This site provides a multitude of datasets containing information on the most populated cities around the world - including Copenhagen.

The datasets provided by Inside Airbnb are as follows: (1) listings, (2) calendar, (3) reviews, (4) listings_summary, (5) reviews_summary.

We have downloaded and inspected all of the datasets. However, only the listings are assessed to be important for this project.

We use the most recent dataset, which was scraped on 28 November 2020.

The listings dataset contains data about the Airbnb host listings and their respective attributes. In total, there are 74 columns describing 8636 listings on the Airbnb platform. However, for this project, the following 17 attributes have been selected for analysis:

  • id: primary key (listing_id)
  • name: name of listing
  • description: room description
  • neighborhood_overview: text description of the neighbourhood
  • neighbourhood_cleansed: location area cleaned of special characters
  • latitude: latitude
  • longitude: longitude
  • property_type: type of property the room is in
  • room_type: type of room that is made available
  • accommodates: max number of people that can stay at a time
  • beds: number of beds in the room
  • bedrooms: number of bedrooms in the room
  • amenities: facilities available
  • price: price of the room per day
  • number_of_reviews: number of times the listing was reviewed
  • last_review: date of the most recent review
  • review_scores_rating: average review score of the listing

3.3 Preprocessing Steps

As usual, before we can begin the data exploration, we need to preprocess the data. Overall, the preprocessing has the following structure:

  1. Install and Import libraries
  2. Gather data
  3. Data Cleaning

Using the pandas library, we download the data and unzip it with pandas' built-in decompression support, then store it in a dataframe that enables data cleaning and manipulation. We clean the data by selecting only the 17 columns listed previously, checking for misclassified datatypes, and renaming columns and values to ease interpretation of the data.

Without further ado, let's get started!

4.0 PROCESSING THE DATA

In this section, we will process and clean the data before initiating the exploratory data analysis.

This project was created in a Colab notebook and exported to fastpages for improved readability and interactive features such as the code button provided below. You will find these buttons throughout this paper; however, some code snippets have been hidden entirely.

If you wish to review the full code, please see the buttons under the headline of this post.

4.1 Install and Import Libraries

import pandas as pd #used to store and manage the data
import numpy as np
import matplotlib.pyplot as plt #visualization library
import plotly.express as px #visualization library used for geospatial data

#Wordcloud related libraries
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

We initiate our data analysis by installing and importing the libraries for our Python interpreter. For this project, we use pandas, NumPy, Matplotlib's pyplot, Plotly Express, Pillow's Image, and WordCloud (with STOPWORDS and ImageColorGenerator). These libraries enable us to clean and process the data and later to create meaningful visualizations.

4.2 Gathering the Data

#Create DataFrames
listings = pd.read_csv('http://data.insideairbnb.com/denmark/hovedstaden/copenhagen/2020-11-28/data/listings.csv.gz', compression='gzip')
listings = listings.iloc[:,[0,4,5,6,27,29,30,31,32,33,36,37,38,39,55,59,60]]

The dataset is downloaded, unzipped and stored in a dataframe using pandas. Again using pandas, the irrelevant columns are dropped, so that only the columns chosen for this project remain.

Now that we have gathered the data, let us take a quick look at it:
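
The column selection above relies on positional indices, which can silently pick different columns if the source file's layout changes between scrapes. As a minimal sketch (not the code used in this project), the same idea can be expressed by column name via the `usecols` parameter of `pd.read_csv`. The in-memory CSV and its `host_id` values below are made-up stand-ins for the real 74-column file.

```python
import io
import pandas as pd

# Tiny stand-in for listings.csv.gz (assumption: the real file has 74 columns).
sample_csv = io.StringIO(
    "id,name,host_id,price,latitude\n"
    "6983,Copenhagen 'N Livin',16774,$361.00,55.68798\n"
    "26057,Lovely house,109777,\"$2,400.00\",55.69163\n"
)

# Columns to keep, referenced by name instead of by position.
wanted_columns = ["id", "name", "price", "latitude"]

# usecols makes pandas load only the named columns.
sample = pd.read_csv(sample_csv, usecols=wanted_columns)
print(sample.columns.tolist())
```

Selecting by name also fails loudly when an expected column is missing, which is usually preferable to silently analysing the wrong attribute.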

listings.head(3)
id name description neighborhood_overview neighbourhood_cleansed latitude longitude property_type room_type accommodates bedrooms beds amenities price number_of_reviews last_review review_scores_rating
0 6983 Copenhagen 'N Livin' Lovely apartment located in the hip Nørrebro a... Nice bars and cozy cafes just minutes away, ye... Nrrebro 55.68798 12.54571 Private room in apartment Private room 2 1.0 1.0 ["Hot water", "Refrigerator", "Heating", "Stov... $361.00 168 2019-07-19 96.0
1 26057 Lovely house - most attractive area Our lovely house in the center of the city is ... The neighborhood is the most famous one and th... Indre By 55.69163 12.57459 Entire house Entire home/apt 6 4.0 4.0 ["Kitchen", "Essentials", "Cooking basics", "I... $2,400.00 50 2019-12-14 98.0
2 29118 Best Location in Cool Istedgade <b>The space</b><br />The apartment is situate... NaN Vesterbro-Kongens Enghave 55.67069 12.55430 Entire apartment Entire home/apt 2 1.0 1.0 ["Hot water", "Changing table", "Refrigerator"... $725.00 22 2019-08-02 98.0
listings.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7337 entries, 0 to 8601
Data columns (total 17 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   listing_id              7337 non-null   int64  
 1   listing_name            7336 non-null   object 
 2   listing_description     7215 non-null   object 
 3   neighborhood_overview   4419 non-null   object 
 4   neighbourhood_cleansed  7337 non-null   object 
 5   latitude                7337 non-null   float64
 6   longitude               7337 non-null   float64
 7   property_type           7337 non-null   object 
 8   room_type               7337 non-null   object 
 9   accommodates            7337 non-null   int64  
 10  bedrooms                7202 non-null   float64
 11  beds                    7321 non-null   float64
 12  amenities               7337 non-null   object 
 13  price                   7337 non-null   object 
 14  number_of_reviews       7337 non-null   int64  
 15  last_review             7337 non-null   object 
 16  review_scores_rating    7230 non-null   float64
dtypes: float64(5), int64(3), object(9)
memory usage: 1.0+ MB
listings.isnull().sum()
id                           0
name                         1
description                240
neighborhood_overview     3700
neighbourhood_cleansed       0
latitude                     0
longitude                    0
property_type                0
room_type                    0
accommodates                 0
bedrooms                   196
beds                        52
amenities                    0
price                        0
number_of_reviews            0
last_review               1299
review_scores_rating      1406
dtype: int64
listings.shape
(8636, 17)

From this quick glimpse, we see that not all hosts provide a description of the neighbourhood in which the listing is located (the values appear as missing, NaN). In fact, 3700 listings have no neighbourhood description.
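
Raw counts of missing values are easier to judge as a share of all rows; for example, 3700 of 8636 listings is roughly 43%. The sketch below shows one way to compute that share per column with `isna().mean()`. The toy dataframe is invented for illustration and only borrows column names from the listings data.

```python
import numpy as np
import pandas as pd

# Made-up sample with the same kind of gaps as the listings data.
toy = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "neighborhood_overview": ["Cozy cafes nearby", np.nan, np.nan, "Close to the metro"],
    "bedrooms": [1.0, np.nan, 2.0, 1.0],
})

# isna() gives True/False per cell; the column mean is the share of missing values.
missing_share = toy.isna().mean().mul(100).round(1)
print(missing_share)
```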

We also find that the letter 'ø' has been removed from the names in the cleansed neighbourhood attribute (e.g. 'Nrrebro' instead of 'Nørrebro').

The price attribute is marked with a '$', as if listed in USD; however, the listing price should be denoted in DKK. Moreover, the price is interpreted as an object instead of a numeric value. Similarly, 'bedrooms' and 'beds' are interpreted as floats instead of integers. Additionally, the last_review attribute should be classified as a datetime type.

4.3 Data Cleaning

Let's try to correct the data anomalies mentioned above.

We start by renaming the columns, and then the neighbourhood names, to make them easier to interpret. To correct the datatypes, some actions have to happen beforehand. For the pricing, the '$' and ',' are removed before correcting the data type. For beds and bedrooms, the missing values are filled with zero, under the assumption that these listings are available but do not offer a bed or a bedroom.

#Rename columns
#Note: the source column is spelled 'neighborhood_overview' (see the output of head() above)
listings.rename(columns={'id':'listing_id','name':'listing_name','description':'listing_description', 'neighborhood_overview':'neighbourhood_description'},inplace=True)

#Rename Neighbourhood Values
listings['neighbourhood_cleansed'] = listings.neighbourhood_cleansed.replace(
        {
        'Nrrebro':'Nørrebro',
         'sterbro':'Østerbro',
         'Amager st': 'Amager Øst',
         'Vanlse': 'Vanløse',
         'Brnshj-Husum':'Brønshøj-Husum'
        }
    )

#Fill missing values
cols = ['beds','bedrooms']
listings[cols] = listings[cols].fillna(0) #fill with the number 0, not the string '0', so the columns stay numeric

#Correct DataTypes
listings = listings.astype(
    {
     'bedrooms':int,
     'beds':int,
     'last_review':'datetime64[ns]'
     }
    )

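#Strip the '$' sign and thousands separator from the prices (already in DKK), then correct the DataType
#regex=False treats '$' as a literal character rather than the regex end-of-string anchor
listings.price = listings.price.str.replace(',','', regex=False)
listings.price = listings.price.str.replace('$','', regex=False)
listings.price = listings.price.astype(float)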

clean = listings

We should now have clean data that we can use to analyze the attributes of the listings. Here, we will investigate the distributions of prices, neighbourhoods, property types, and room types. Similarly, we will investigate the average prices of listings by neighbourhood, property type, and room type. In conclusion, we aim to list the differences that occur between the categories.

Towards the end, the listings are merged with the reviews into one dataframe that contains data about both. They are joined on listing_id using an inner join. We can then use this dataframe to display the interactive map, and the wordclouds that summarize the reviews of each neighbourhood.
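
Before moving on, it can help to confirm that the cleaning did what was intended. The following is a minimal sketch under stated assumptions: it applies the same kind of cleaning steps to a three-row toy dataframe (the values mirror the prices shown by head() earlier) and then asserts the resulting datatypes. The names `raw` and `cleaned` are illustrative and not part of the project code.

```python
import pandas as pd

# Toy data shaped like the raw listings columns.
raw = pd.DataFrame({
    "price": ["$361.00", "$2,400.00", "$725.00"],
    "bedrooms": [1.0, None, 1.0],
    "last_review": ["2019-07-19", None, "2019-08-02"],
})

cleaned = raw.copy()

# Remove the thousands separator and currency sign literally, then convert to float.
cleaned["price"] = (
    cleaned["price"]
    .str.replace(",", "", regex=False)
    .str.replace("$", "", regex=False)
    .astype(float)
)

# Missing bedrooms become 0, which allows an integer datatype.
cleaned["bedrooms"] = cleaned["bedrooms"].fillna(0).astype(int)

# Missing dates become NaT rather than raising an error.
cleaned["last_review"] = pd.to_datetime(cleaned["last_review"])

# Sanity checks on the resulting datatypes.
assert pd.api.types.is_float_dtype(cleaned["price"])
assert pd.api.types.is_integer_dtype(cleaned["bedrooms"])
assert pd.api.types.is_datetime64_any_dtype(cleaned["last_review"])
print(cleaned.dtypes)
```

Note that filling missing bedrooms with zero is a modelling assumption; the checks only confirm the datatypes, not whether that assumption is reasonable.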

5.0 EXPLORATORY DATA ANALYSIS

Distributions

Neighbourhoods

neighbourhoods = listings['neighbourhood_cleansed'].value_counts().to_frame(name='listings').reset_index()
neighbourhoods
index listings
0 Indre By 1426
1 Vesterbro-Kongens Enghave 1169
2 Nørrebro 1119
3 Frederiksberg 817
4 Østerbro 777
5 Amager Vest 640
6 Amager Øst 546
7 Valby 267
8 Bispebjerg 261
9 Vanløse 181
10 Brønshøj-Husum 134

Property Types

listings[['property_type']].value_counts().to_frame(name='listings').reset_index()
property_type listings
0 Entire apartment 4905
1 Private room in apartment 916
2 Entire condominium 412
3 Entire house 338
4 Entire townhouse 155
5 Entire serviced apartment 153
6 Private room in house 89
7 Entire loft 88
8 Private room in condominium 54
9 Private room in townhouse 39
10 Entire villa 31
11 Private room in villa 21
12 Room in hostel 17
13 Houseboat 12
14 Private room in bed and breakfast 11
15 Entire guesthouse 9
16 Entire guest suite 9
17 Private room in hostel 8
18 Shared room in apartment 8
19 Private room in loft 7
20 Boat 7
21 Private room in guesthouse 6
22 Private room in guest suite 6
23 Room in serviced apartment 5
24 Tiny house 4
25 Shared room in hostel 4
26 Private room in bungalow 3
27 Room in boutique hotel 3
28 Entire cabin 3
29 Entire bungalow 3
30 Room in hotel 3
31 Private room in tiny house 2
32 Private room in boat 2
33 Hut 1
34 Island 1
35 Private room 1
36 Private room in serviced apartment 1

Room Types

listings[['room_type']].value_counts().to_frame(name='listings').reset_index()
room_type listings
0 Entire home/apt 6131
1 Private room 1169
2 Hotel room 25
3 Shared room 12

Property Types by Neighbourhood

#Select the four most common property types (each listed more than 300 times)
listings_clean = listings[listings.property_type.isin(['Entire apartment','Private room in apartment','Entire condominium','Entire house'])]

#Count number of listings in neighbourhoods by property type   
listings_byNeighbourhood = listings_clean.groupby(['neighbourhood_cleansed','property_type']).neighbourhood_cleansed.count().to_frame(name = 'listings').reset_index()

#Sum number of listings per neighbourhood
listingsNeighbourhoodCount = listings_byNeighbourhood.groupby('neighbourhood_cleansed')['listings'].sum().to_frame(name = 'total_listings').sort_values(by='total_listings', ascending=False).reset_index()

#Calculate ratio of property types in the different neighbourhoods
neighbourhoodPropertyRatio = listings_byNeighbourhood.merge(listingsNeighbourhoodCount, on='neighbourhood_cleansed')
neighbourhoodPropertyRatio['ratio_of_property_type_in_neighbourhood'] = neighbourhoodPropertyRatio['listings']/neighbourhoodPropertyRatio['total_listings']*100
neighbourhoodPropertyRatio.head(10)
neighbourhood_cleansed property_type listings total_listings ratio_of_property_type_in_neighbourhood
0 Amager Vest Entire apartment 373 551 67.695100
1 Amager Vest Entire condominium 37 551 6.715064
2 Amager Vest Entire house 62 551 11.252269
3 Amager Vest Private room in apartment 79 551 14.337568
4 Amager Øst Entire apartment 301 466 64.592275
5 Amager Øst Entire condominium 36 466 7.725322
6 Amager Øst Entire house 59 466 12.660944
7 Amager Øst Private room in apartment 70 466 15.021459
8 Bispebjerg Entire apartment 168 246 68.292683
9 Bispebjerg Entire condominium 18 246 7.317073
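
The count, sum, merge and divide steps above can also be expressed in a single call. As a hedged alternative sketch (not the method used in this project), `pd.crosstab` with `normalize='index'` returns the share of each property type within each neighbourhood. The toy data below is invented for illustration.

```python
import pandas as pd

# Made-up listings: four in Valby, two in Indre By.
toy = pd.DataFrame({
    "neighbourhood_cleansed": ["Valby", "Valby", "Valby", "Valby", "Indre By", "Indre By"],
    "property_type": [
        "Entire apartment", "Entire apartment", "Entire apartment", "Entire house",
        "Entire apartment", "Entire condominium",
    ],
})

# normalize="index" divides each row by its row total, so every neighbourhood sums to 100%.
share = pd.crosstab(
    toy["neighbourhood_cleansed"],
    toy["property_type"],
    normalize="index",
) * 100
print(share.round(1))
```

The result is in wide format (one column per property type), which is convenient for reading across a neighbourhood, whereas the long format used above is easier to feed into plotting functions.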

Room Types by Neighbourhood

#Count number of listings in neighbourhoods by room type
roomCount = listings.groupby(['neighbourhood_cleansed','room_type']).neighbourhood_cleansed.count().to_frame(name = 'listings').reset_index()

#Sum number of listings per neighbourhood
roomNeighbourhoodCount = roomCount.groupby('neighbourhood_cleansed')['listings'].sum().to_frame(name = 'total_listings').sort_values(by='total_listings', ascending=False).reset_index()

#Calculate ratio of room types in the different neighbourhoods
#Merge with roomNeighbourhoodCount (all listings), not the property-type totals, so the ratios sum to 100% per neighbourhood
roomRatio = roomCount.merge(roomNeighbourhoodCount, on='neighbourhood_cleansed')
roomRatio['ratio_of_room_type_in_neighbourhood'] = roomRatio['listings']/roomRatio['total_listings']*100
roomRatio.head(10)
neighbourhood_cleansed room_type listings total_listings ratio_of_room_type_in_neighbourhood
0 Amager Vest Entire home/apt 529 640 82.656250
1 Amager Vest Hotel room 1 640 0.156250
2 Amager Vest Private room 108 640 16.875000
3 Amager Vest Shared room 2 640 0.312500
4 Amager Øst Entire home/apt 434 546 79.487179
5 Amager Øst Hotel room 3 546 0.549451
6 Amager Øst Private room 108 546 19.780220
7 Amager Øst Shared room 1 546 0.183150
8 Bispebjerg Entire home/apt 206 261 78.927203
9 Bispebjerg Private room 55 261 21.072797

Accommodations by Neighbourhood

df = pd.DataFrame()
df['avg_n_accommodations'] =listings.groupby('neighbourhood_cleansed').accommodates.mean()
df = df.reset_index()
df
neighbourhood_cleansed avg_n_accommodations
0 Amager Vest 3.668750
1 Amager Øst 3.483516
2 Bispebjerg 3.249042
3 Brønshøj-Husum 4.589552
4 Frederiksberg 3.470012
5 Indre By 3.816971
6 Nørrebro 3.102770
7 Valby 3.550562
8 Vanløse 3.872928
9 Vesterbro-Kongens Enghave 3.230967
10 Østerbro 3.464607

Pricing by Neighbourhood

neighbourhoodPricing = listings.groupby('neighbourhood_cleansed').price.mean().to_frame().sort_values(by='price', ascending=False).reset_index()
neighbourhoodPricing
neighbourhood_cleansed price
0 Indre By 1532.850631
1 Østerbro 1039.018018
2 Amager Øst 1020.058608
3 Vesterbro-Kongens Enghave 1017.957228
4 Frederiksberg 1002.586291
5 Amager Vest 956.114063
6 Nørrebro 874.478999
7 Brønshøj-Husum 809.671642
8 Valby 753.269663
9 Vanløse 752.756906
10 Bispebjerg 674.777778
#Calculate Average Price Per Person
#Merge on the neighbourhood name: neighbourhoodPricing is sorted by price, so assigning its price column by position would pair prices with the wrong neighbourhoods
df = df.merge(neighbourhoodPricing, on='neighbourhood_cleansed')
df['price_perPerson'] = df.price/df.avg_n_accommodations
df
neighbourhood_cleansed avg_n_accommodations price price_perPerson
0 Amager Vest 3.668750 956.114063 260.610
1 Amager Øst 3.483516 1020.058608 292.824
2 Bispebjerg 3.249042 674.777778 207.685
3 Brønshøj-Husum 4.589552 809.671642 176.416
4 Frederiksberg 3.470012 1002.586291 288.929
5 Indre By 3.816971 1532.850631 401.588
6 Nørrebro 3.102770 874.478999 281.838
7 Valby 3.550562 753.269663 212.155
8 Vanløse 3.872928 752.756906 194.364
9 Vesterbro-Kongens Enghave 3.230967 1017.957228 315.063
10 Østerbro 3.464607 1039.018018 299.895
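
The per-person figure above divides one neighbourhood average by another (a ratio of means). A different, equally defensible definition is to compute the price per person for each listing first and then average (a mean of ratios); the two generally give different numbers. The sketch below illustrates the difference on invented data and is not a result from the Copenhagen dataset. Assigning a grouped Series to a dataframe that is indexed by neighbourhood also aligns values by name rather than by row position.

```python
import pandas as pd

# Made-up listings, two per neighbourhood.
toy = pd.DataFrame({
    "neighbourhood_cleansed": ["Valby", "Valby", "Indre By", "Indre By"],
    "price": [600.0, 1200.0, 1000.0, 3000.0],
    "accommodates": [2, 6, 2, 4],
})

# Ratio of means: average price divided by average capacity.
per_hood = toy.groupby("neighbourhood_cleansed").agg(
    avg_price=("price", "mean"),
    avg_accommodates=("accommodates", "mean"),
)
per_hood["ratio_of_means"] = per_hood["avg_price"] / per_hood["avg_accommodates"]

# Mean of ratios: price per person for every listing, then averaged.
toy["price_per_person"] = toy["price"] / toy["accommodates"]
per_hood["mean_of_ratios"] = toy.groupby("neighbourhood_cleansed")["price_per_person"].mean()

print(per_hood)
```

Which definition is more appropriate depends on the question: the ratio of means weights large listings more heavily, while the mean of ratios treats every listing equally.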

Pricing by Property Type

PropertyPricing = listings.groupby('property_type').price.mean().to_frame().sort_values(by='price', ascending=False).reset_index()
PropertyPricing.head(20)
property_type price
0 Boat 1839.714286
1 Entire villa 1800.774194
2 Houseboat 1636.083333
3 Island 1600.000000
4 Private room 1500.000000
5 Entire loft 1488.738636
6 Entire townhouse 1476.600000
7 Entire serviced apartment 1459.496732
8 Entire house 1371.062130
9 Room in serviced apartment 1235.400000
10 Entire condominium 1161.533981
11 Private room in tiny house 1123.500000
12 Entire apartment 1104.796330
13 Entire cabin 1022.000000
14 Entire bungalow 1009.666667
15 Room in hotel 921.333333
16 Room in hostel 894.529412
17 Room in boutique hotel 872.666667
18 Hut 850.000000
19 Entire guest suite 786.222222
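
Several of the property types at the top of this ranking are represented by very few listings (for example, 'Island' and 'Private room' have a single listing each), so their averages should be read with caution. One way to make this visible, sketched here on invented data, is to report the count and median next to the mean and to filter on a minimum number of listings. The threshold `min_listings` is a hypothetical choice, not a value used in this project.

```python
import pandas as pd

# Made-up prices: one common property type and one singleton.
toy = pd.DataFrame({
    "property_type": ["Entire apartment"] * 4 + ["Island"],
    "price": [900.0, 1100.0, 1000.0, 1200.0, 1600.0],
})

# Mean, median and number of listings per property type.
summary = toy.groupby("property_type")["price"].agg(["mean", "median", "count"])

# Keep only property types with enough listings for the average to be meaningful.
min_listings = 3  # hypothetical threshold
reliable = summary[summary["count"] >= min_listings]
print(reliable)
```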

Pricing by Room Type

roomPricing = listings.groupby('room_type').price.mean().to_frame().sort_values(by='price', ascending=False).reset_index()
roomPricing.head(20)
room_type price
0 Entire home/apt 1151.074539
1 Hotel room 960.080000
2 Shared room 618.583333
3 Private room 584.256630

Visualizations

Maps

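#Merge reviews and listings
#Note: reviews is not created in the code shown here; it is assumed to be loaded in a hidden cell, with the columns listing_id and review_text
group_listingReviews = reviews.merge(listings, on='listing_id', how='inner')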

#Define mapbox API token and style
mapbox_access_token = 'pk.eyJ1IjoiYWNodG9uMjExMSIsImEiOiJja2lyam5yemgyNTV0MnJsYmJ0NXdzNWRxIn0.rWJgur27hJnWoBt7Oq5LeQ'
px.set_mapbox_access_token(mapbox_access_token)
plot_style = 'mapbox://styles/achton2111/ckirsv5df0aj01at4zp0d7f3w'

#Interactive Geospatial plot
fig = px.scatter_mapbox(group_listingReviews,
                        lat="latitude",
                        lon="longitude",
                        color="neighbourhood_cleansed",
                        zoom=10,
                        size='price',
                        mapbox_style= plot_style,
                        hover_name='listing_name',
                        #A list keeps the hover fields in a fixed order (a set has no defined order)
                        hover_data = ['price',
                                      'property_type',
                                      'room_type',
                                      'accommodates',
                                      'beds',
                                      'review_scores_rating'],
                        opacity = 0.8,
                        title = 'Airbnb Listing Locations (Coloured by Neighbourhood, Sized by Price)'
                        )
fig.show()
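
Because the merged dataframe holds one row per review, a listing with many reviews is drawn many times at the same coordinates. If one marker per listing is wanted, the duplicates can be dropped before plotting. The sketch below shows the idea on invented data; the names `listings_toy` and `reviews_toy` are illustrative only.

```python
import pandas as pd

# Made-up data: listing 1 has three reviews, listing 2 has one.
listings_toy = pd.DataFrame({"listing_id": [1, 2], "price": [361.0, 725.0]})
reviews_toy = pd.DataFrame({
    "listing_id": [1, 1, 1, 2],
    "review_text": ["Great", "Cozy", "Nice host", "Central"],
})

# The inner join repeats the listing attributes once per review.
merged = reviews_toy.merge(listings_toy, on="listing_id", how="inner")

# Keep the first row of each listing so every listing is plotted once.
one_row_per_listing = merged.drop_duplicates(subset="listing_id")
print(len(merged), len(one_row_per_listing))
```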

WordClouds

Let's try to see if there are any visual differences between reviews in the different neighbourhoods

#Distinction between neighbourhoods (using the names corrected during data cleaning)
norrebro = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Nørrebro']
indreby = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Indre By']
vesterbro_KgsEnghave = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Vesterbro-Kongens Enghave']
osterbro = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Østerbro']
frederiksberg = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Frederiksberg']
amagerOst = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Amager Øst']
amagerVest = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Amager Vest']
valby = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Valby']
bispebjerg = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Bispebjerg']
vanlose = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Vanløse']
bronshojHusum = group_listingReviews[group_listingReviews.neighbourhood_cleansed == 'Brønshøj-Husum']

Nørrebro

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in norrebro.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Indre By

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in indreby.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Vesterbro - Kongens Enghave

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in vesterbro_KgsEnghave.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Østerbro

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in osterbro.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Frederiksberg

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in frederiksberg.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Amager Øst

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in amagerOst.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Amager Vest

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in amagerVest.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Valby

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in valby.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Bispebjerg

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in bispebjerg.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Vanløse

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in vanlose.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()

Brønshøj - Husum

# Collecting the lower-cased review text for this neighbourhood
comment_words = ""  # start from an empty string so earlier neighbourhoods are not included
for i in bronshojHusum.review_text:
    separate = str(i).lower().split()
    comment_words += " ".join(separate) + " "

# Creating the Word Cloud
final_wordcloud = WordCloud(width = 3000, height = 2000, 
                background_color ='black', 
                stopwords = STOPWORDS, 
                min_font_size = 10).generate(comment_words)

# Displaying the WordCloud                    
plt.figure(facecolor = None) 
plt.imshow(final_wordcloud) 
plt.axis("off") 
plt.tight_layout(pad = 0) 
  
plt.show()
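
The eleven wordcloud cells repeat the same steps and differ only in the neighbourhood. A possible refactoring, sketched here with the standard library only, is a small helper that counts word frequencies per neighbourhood in one loop. The helper name `word_frequencies`, the stopword set and the sample reviews are all hypothetical; the resulting counts could then be passed to WordCloud's `generate_from_frequencies` method instead of building one long string.

```python
from collections import Counter

def word_frequencies(texts, stopwords=frozenset()):
    """Count lower-cased words across texts, skipping stopwords (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        for word in str(text).lower().split():
            if word not in stopwords:
                counts[word] += 1
    return counts

# Made-up reviews grouped by neighbourhood.
reviews_by_neighbourhood = {
    "Nørrebro": ["Great bars and cafes", "Cozy flat near the bars"],
    "Valby": ["Quiet street", "Quiet and green"],
}

stop = {"the", "a", "and", "was"}  # illustrative stopword set

# One frequency table per neighbourhood, each built from its own reviews only.
frequencies = {
    hood: word_frequencies(texts, stop)
    for hood, texts in reviews_by_neighbourhood.items()
}
print(frequencies["Nørrebro"].most_common(1))
```

Building each table from scratch also guarantees that words from one neighbourhood do not leak into the cloud of the next.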

6.0 RESULTS

7.0 DISCUSSION

8.0 CONCLUSION